Day27 - Feature Engineering -- 13. Featuretools - iT 邦幫忙

12th iThome Ironman Contest

DAY 27

AI & Data

Machine Learning series, part 27

tjabi

2020-09-27 23:43:06

Featuretools

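Featuretools is an open-source Python library that performs feature engineering automatically. Like many other machine learning topics, automated feature engineering is a complex subject built on simple ideas.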

The main concepts in Featuretools are:
Entities and EntitySets
Relationships between tables
Feature primitives: aggregations and transformations
Deep feature synthesis: the method Featuretools uses to build features

Entities and EntitySets
An entity is a single table, i.e. one Pandas DataFrame.
An EntitySet is a collection of related tables together with the relationships between them.
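The plain-Python sketch below makes these two definitions concrete. It shows what an EntitySet holds: named tables plus the links between them. The classes SimpleEntitySet and TableLink and the toy rows are hypothetical illustrations, not featuretools classes. The uniqueness check mirrors the rule that every entity needs a unique index.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only; these are NOT featuretools classes.
@dataclass
class TableLink:
    parent: str       # parent table name
    parent_key: str   # unique key column in the parent table
    child: str        # child table name
    child_key: str    # column in the child table that points at the parent

@dataclass
class SimpleEntitySet:
    id: str
    tables: dict = field(default_factory=dict)  # table name -> list of row dicts
    links: list = field(default_factory=list)   # TableLink objects

    def add_table(self, name, rows, index):
        # Each entity needs an index column whose values are unique
        keys = [row[index] for row in rows]
        if len(keys) != len(set(keys)):
            raise ValueError(f"index {index!r} is not unique in table {name!r}")
        self.tables[name] = rows
        return self

    def add_link(self, link):
        self.links.append(link)
        return self

es = SimpleEntitySet(id="clients")
es.add_table("clients", [{"client_id": 1}, {"client_id": 2}], index="client_id")
es.add_table("loans", [{"loan_id": 10, "client_id": 1},
                       {"loan_id": 11, "client_id": 1}], index="loan_id")
es.add_link(TableLink("clients", "client_id", "loans", "client_id"))
print(sorted(es.tables), len(es.links))  # ['clients', 'loans'] 1
```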

Create an EntitySet:

import featuretools as ft

# Create an empty EntitySet with id 'clients'
es = ft.EntitySet(id='clients')

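Add each table (a pandas DataFrame; here clients and loans, assumed already loaded) to the EntitySet as an entity:

# featuretools 0.x API (replaced by es.add_dataframe in featuretools 1.0)
# Table with a unique index column: each client appears once, keyed by client_id
es = es.entity_from_dataframe(entity_id='clients', dataframe=clients, index='client_id')
# Table without a unique index column: make_index=True creates a new index
# column with the given name (assumes loans has no existing 'loan_id' column)
es = es.entity_from_dataframe(entity_id='loans', dataframe=loans,
                              make_index=True, index='loan_id')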
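A small plain-Python sketch can show the idea behind make_index=True. When a table has no column with unique values, a new column of sequential integers is added and used as the index. This is roughly what happens to the DataFrame, but the helper add_sequential_index and the payments rows are hypothetical.

```python
def add_sequential_index(rows, index_name):
    """Return copies of rows with a new unique integer column index_name (0, 1, 2, ...)."""
    if any(index_name in row for row in rows):
        # This sketch refuses to overwrite an existing column
        raise ValueError(f"column {index_name!r} already exists")
    return [{index_name: i, **row} for i, row in enumerate(rows)]

# Hypothetical table with no natural unique key (rows 0 and 1 are identical)
payments = [
    {"client_id": 1, "amount": 100},
    {"client_id": 1, "amount": 100},
    {"client_id": 2, "amount": 250},
]
indexed = add_sequential_index(payments, "payment_id")
print([row["payment_id"] for row in indexed])  # [0, 1, 2]
```

The original rows are left untouched; each indexed row is a new dict.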

Relationships
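Relationships are a fundamental concept in relational databases. A one-to-many relationship is like the relationship between parents and children. Each parent is a single person who can have several children, and those children can in turn have children of their own. In the parent table, each person occupies one row. Each parent can have multiple children (multiple rows) in the child table.

Two tables are linked through a shared variable. For each relationship, we must specify the parent variable and the child variable.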

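# Create the relationship: parent variable first, then child variable
# (each client in clients can have many rows in loans)
r_client_previous = ft.Relationship(es['clients']['client_id'],
                                    es['loans']['client_id'])

# Add the relationship to the entity set
es = es.add_relationship(r_client_previous)
es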

Entityset: clients
  Entities:
    clients [Rows: 25, Columns: 6]
    loans [Rows: 443, Columns: 8]
  Relationships:
    loans.client_id -> clients.client_id
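In a one-to-many relationship, each child row carries the key of exactly one parent row, so the children of every parent can be collected by that shared key. Below is a plain-Python sketch with hypothetical toy rows (not the 25-client / 443-loan data shown above). The function group_children is an illustrative helper, not part of featuretools.

```python
# Hypothetical toy tables: clients is the parent, loans is the child
clients = [{"client_id": 1, "name": "A"}, {"client_id": 2, "name": "B"}]
loans = [
    {"loan_id": 10, "client_id": 1, "amount": 2000},
    {"loan_id": 11, "client_id": 1, "amount": 500},
    {"loan_id": 12, "client_id": 2, "amount": 1200},
]

def group_children(parent_rows, parent_key, child_rows, child_key):
    """Map each parent key to the child rows that reference it (one-to-many)."""
    groups = {row[parent_key]: [] for row in parent_rows}
    for child in child_rows:
        key = child[child_key]
        if key not in groups:
            # Every child row must point at an existing parent row
            raise KeyError(f"child row references missing parent {key!r}")
        groups[key].append(child)
    return groups

children = group_children(clients, "client_id", loans, "client_id")
print({k: len(v) for k, v in children.items()})  # {1: 2, 2: 1}
```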

Feature Primitives

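A primitive is an operation commonly used to build features. Featuretools applies these operations automatically to create new features, and it can stack them to build more complex features.
Feature primitives come in two types:

Aggregation: computed across a parent-child relationship. For each parent, it gathers the related rows in the child table and summarizes them with statistics such as the mean, min, max, or standard deviation.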

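# Aggregation primitives summarize each client's loans; transform primitives
# work on columns within a single table ('year' is used here because the
# transform primitive that extracts the year from a date is named 'year')
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['mean', 'max', 'percent_true', 'last'],
                                 trans_primitives=['year', 'month', 'diff'])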
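The aggregation primitives in the call above (mean, max, percent_true, last) each reduce one client's loan rows to a single value. Below is a plain-Python sketch of that step. The toy rows, the column names amount and repaid, and the function aggregate_loans are hypothetical. Feature names imitate featuretools' naming style, e.g. MEAN(loans.amount). LAST takes the last row in list order here; in featuretools its result depends on how the data is ordered, for example by a time index.

```python
from statistics import mean

# Hypothetical toy child table: several loans per client
loans = [
    {"client_id": 1, "amount": 2000, "repaid": True},
    {"client_id": 1, "amount": 500, "repaid": False},
    {"client_id": 1, "amount": 1000, "repaid": True},
    {"client_id": 2, "amount": 1200, "repaid": True},
]
FEATURES = ["MEAN(loans.amount)", "MAX(loans.amount)",
            "PERCENT_TRUE(loans.repaid)", "LAST(loans.amount)"]

def aggregate_loans(client_id):
    """Summarize one client's loans with a few aggregation primitives."""
    rows = [r for r in loans if r["client_id"] == client_id]
    if not rows:
        # A client with no loans gets missing values (None in this sketch)
        return dict.fromkeys(FEATURES)
    amounts = [r["amount"] for r in rows]
    repaid = [r["repaid"] for r in rows]
    return dict(zip(FEATURES, [
        mean(amounts),
        max(amounts),
        sum(repaid) / len(repaid),  # True counts as 1, False as 0
        amounts[-1],                # last in list order in this sketch
    ]))

for cid in (1, 2, 3):
    print(cid, aggregate_loans(cid))
```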

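Transformation: an operation applied to one or more columns of a single table, for example the absolute value of a column or the difference between two columns.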
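Transform primitives need no relationship; they compute new values row by row within one table. The plain-Python sketch below shows the examples just mentioned (difference of two columns, absolute value), plus month and year extraction from a date. The repaid_amount column and the toy rows are hypothetical, and the feature names only imitate featuretools' style. The 'diff' primitive in the dfs call above works differently: in its simplest form, it compares each value with the previous row's value in the same column.

```python
from datetime import date

# Hypothetical single table; repaid_amount is an invented column for illustration
loans = [
    {"amount": 2000, "repaid_amount": 1800, "loan_start": date(2019, 3, 14)},
    {"amount": 500, "repaid_amount": 650, "loan_start": date(2019, 7, 2)},
    {"amount": 1000, "repaid_amount": 1000, "loan_start": date(2020, 1, 20)},
]

# Row-wise transforms: each new value depends only on the same row
for row in loans:
    row["amount - repaid_amount"] = row["amount"] - row["repaid_amount"]  # difference of two columns
    row["ABSOLUTE(amount - repaid_amount)"] = abs(row["amount - repaid_amount"])
    row["MONTH(loan_start)"] = row["loan_start"].month
    row["YEAR(loan_start)"] = row["loan_start"].year

# Consecutive difference within one column; the first row has no previous value
amounts = [row["amount"] for row in loans]
diffs = [None] + [cur - prev for prev, cur in zip(amounts, amounts[1:])]
print(diffs)  # [None, -1500, 500]
```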

Deep Feature Synthesis

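Deep Feature Synthesis (DFS) is the process Featuretools uses to create new features. It applies feature primitives across the EntitySet, following its relationships, and stacks primitives on top of one another's outputs.

Run automated feature engineering to create new features (max_depth limits how many primitives may be stacked into a single feature):

# With no primitives specified, dfs uses its default primitive lists.
# features is the feature matrix (one row per client); feature_names holds the feature definitions
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 max_depth=2)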
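The toy plain-Python sketch below illustrates what max_depth controls; it is not featuretools' implementation. A depth-1 feature applies one primitive, e.g. MEAN(loans.amount). A depth-2 feature stacks a transform and an aggregation, e.g. MEAN(loans.MONTH(loan_start)). The tables, column names, and the dfs_sketch function are hypothetical. Real DFS generates many more combinations across all entities and primitives.

```python
from datetime import date
from statistics import mean

# Hypothetical toy child table
loans = [
    {"client_id": 1, "amount": 2000, "loan_start": date(2019, 3, 14)},
    {"client_id": 1, "amount": 500, "loan_start": date(2019, 7, 2)},
    {"client_id": 2, "amount": 1200, "loan_start": date(2020, 1, 20)},
]
AGG = {"MEAN": mean, "MAX": max}      # aggregation primitives
TRANS = {"MONTH": lambda d: d.month}  # transform primitives

def dfs_sketch(client_id, max_depth):
    """Toy DFS over one parent-child link, stacking at most max_depth primitives."""
    rows = [r for r in loans if r["client_id"] == client_id]
    features = {}
    if not rows or max_depth < 1:
        return features  # no loans (or depth 0): nothing to aggregate in this sketch
    # Depth 1: aggregate a raw child column
    for a_name, agg in AGG.items():
        features[f"{a_name}(loans.amount)"] = agg([r["amount"] for r in rows])
    # Depth 2: transform a child column, then stack an aggregation on top
    if max_depth >= 2:
        for t_name, trans in TRANS.items():
            values = [trans(r["loan_start"]) for r in rows]
            for a_name, agg in AGG.items():
                features[f"{a_name}(loans.{t_name}(loan_start))"] = agg(values)
    return features

print(dfs_sketch(1, max_depth=1))
print(dfs_sketch(1, max_depth=2))
```

Raising max_depth only adds stacked features on top of the shallower ones. This is why larger depths grow the feature count quickly.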
